Goto

Collaborating Authors

 grammatical correctness


Sensitivity of Small Language Models to Fine-tuning Data Contamination

arXiv.org Artificial Intelligence

Small Language Models (SLMs) are increasingly being deployed in resource-constrained environments, yet their behavioral robustness to data contamination during instruction tuning remains poorly understood. We systematically investigate the contamination sensitivity of 23 SLMs (270M to 4B parameters) across multiple model families by measuring susceptibility to syntactic and semantic transformation types during instruction tuning: syntactic transformations (character and word reversal) and semantic transformations (irrelevant and counterfactual responses), each applied at contamination levels of 25\%, 50\%, 75\%, and 100\%. Our results reveal fundamental asymmetries in vulnerability patterns: syntactic transformations cause catastrophic performance degradation, with character reversal producing near-complete failure across all models regardless of size or family, while semantic transformations demonstrate distinct threshold behaviors and greater resilience in core linguistic capabilities. Critically, we discover a ``\textit{capability curse}" where larger, more capable models become more susceptible to learning semantic corruptions, effectively following harmful instructions more readily, while our analysis of base versus instruction-tuned variants reveals that alignment provides inconsistent robustness benefits, sometimes even reducing resilience. Our work establishes three core contributions: (1) empirical evidence of SLMs' disproportionate vulnerability to syntactic pattern contamination, (2) identification of asymmetric sensitivity patterns between syntactic and semantic transformations, and (3) systematic evaluation protocols for contamination robustness assessment. These findings have immediate deployment implications, suggesting that current robustness assumptions may not hold for smaller models and highlighting the need for contamination-aware training protocols.


Differentially-private text generation degrades output language quality

arXiv.org Artificial Intelligence

Ensuring user privacy by synthesizing data from large language models (LLMs) tuned under differential privacy (DP) has become popular recently. However, the impact of DP fine-tuned LLMs on the quality of the language and the utility of the texts they produce has not been investigated. In this work, we tune five LLMs with three corpora under four levels of privacy and assess the length, the grammatical correctness, and the lexical diversity of the text outputs they produce. We also probe the utility of the synthetic outputs in downstream classification tasks such as book genre recognition based on book descriptions and cause of death recognition based on verbal autopsies. The results indicate that LLMs tuned under stronger privacy constrains produce texts that are shorter by at least 77 %, that are less grammatically correct by at least 9 %, and are less diverse by at least 10 % in bi-gram diversity. Furthermore, the accuracy they reach in downstream classification tasks decreases, which might be detrimental to the usefulness of the generated synthetic data.


Can Small Language Models Learn, Unlearn, and Retain Noise Patterns?

arXiv.org Artificial Intelligence

Small Language Models (SLMs) are generally considered to be more compact versions of large language models (LLMs), typically having fewer than 7 billion parameters. This study investigates the ability of small language models to learn, retain, and subsequently eliminate noise that is typically not found on the internet, where most pretraining datasets are sourced. For this, four pre-trained SLMs were utilized: Olmo 1B, Qwen1.5 1.8B, Gemma 2B, and Phi2 2.7B. The models were instruction-tuned without noise and tested for task execution with in-context learning. Afterward, noise patterns were introduced to evaluate the models' learning and unlearning capabilities. We evaluated the models' performance at various training levels. Phi consistently excelled with word-level noise but performed the worst with character-level noise. Despite being the smallest with approximately 1 billion parameters, Olmo performed consistently well on tasks.


Privacy Concerns in Chatbot Interactions: When to Trust and When to Worry

arXiv.org Artificial Intelligence

Through advances in their conversational abilities, chatbots have started to request and process an increasing variety of sensitive personal information. The accurate disclosure of sensitive information is essential where it is used to provide advice and support to users in the healthcare and finance sectors. In this study, we explore users' concerns regarding factors associated with the use of sensitive data by chatbot providers. We surveyed a representative sample of 491 British citizens. Our results show that the user concerns focus on deleting personal information and concerns about their data's inappropriate use. We also identified that individuals were concerned about losing control over their data after a conversation with conversational agents. We found no effect from a user's gender or education but did find an effect from the user's age, with those over 45 being more concerned than those under 45. We also considered the factors that engender trust in a chatbot. Our respondents' primary focus was on the chatbot's technical elements, with factors such as the response quality being identified as the most critical factor. We again found no effect from the user's gender or education level; however, when we considered some social factors (e.g. avatars or perceived 'friendliness'), we found those under 45 years old rated these as more important than those over 45. The paper concludes with a discussion of these results within the context of designing inclusive, digital systems that support a wide range of users.


Adapting a Language Model for Controlled Affective Text Generation

arXiv.org Artificial Intelligence

Human use language not just to convey information but also to express their inner feelings and mental states. In this work, we adapt the state-of-the-art language generation models to generate affective (emotional) text. We posit a model capable of generating affect-driven and topic focused sentences without losing grammatical correctness as the affect intensity increases. We propose to incorporate emotion as prior for the probabilistic state-of-the-art text generation model such as GPT-2. The model gives a user the flexibility to control the category and intensity of emotion as well as the topic of the generated text. Previous attempts at modelling fine-grained emotions fall out on grammatical correctness at extreme intensities, but our model is resilient to this and delivers robust results at all intensities. We conduct automated evaluations and human studies to test the performance of our model, and provide a detailed comparison of the results with other models. In all evaluations, our model outperforms existing affective text generation models.